Centre for Advanced Spatial Analysis, University College London

Professor of Urban Analytics & Current Head of Department @ Bartlett Centre for Advanced Spatial Analysis (CASA), UCL
Geographer by background - ex-Secondary School Teacher - back in HE for 15+ years
Taught GIS / Spatial Data Science at postgrad level for last 10 years
Whistle-stop tour of some of the key concepts relating to spatial data
An illustrative example analysing some spatial data in London - demonstrating the “spatial is special” idiom and how we might account for spatial factors in our analysis
I hope you’ll all leave with a better understanding of how we should pay attention to the influence of space in any analysis
Everything happens somewhere

Coordinates are more reliable than names, which are rarely unique and often reference fuzzy locations
The earth is roughly spherical and points anywhere on its surface can be described using the World Geodetic System (WGS) - a geographic (spherical) coordinate system
Points can be referenced according to their position on a grid of latitudes (degrees north or south of the equator) and longitudes (degrees east or west of the Prime - Greenwich - meridian)
The last major revision of the World Geodetic System was in 1984 and WGS84 is still used today as the standard system for referencing places on the globe.
Projected Coordinate Reference Systems convert the 3D globe to a 2D plane and can do so in a huge variety of different ways
Most national mapping agencies have their own projected coordinate systems - in Britain the Ordnance Survey maintain the British National Grid which locates places according to 6-digit Easting and Northing coordinates
Most coordinate reference systems can be referenced by an EPSG code, e.g. WGS84 = 4326 and British National Grid = 27700, with mathematical transformations to convert between them

Once we have a coordinate reference system we can locate objects accurately in space
Most objects that spatial data scientists are concerned with (apart from gridded representations, which we will ignore for now!) can be simplified to either a point, a line or a polygon in that space
Polygons and lines are just multiple point coordinates joined together!
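To make the "points joined together" idea concrete, here is a minimal sketch using plain coordinate lists. The coordinates are hypothetical British National Grid eastings and northings (metres), and the helper names `line_length` and `polygon_area` are illustrative rather than taken from any GIS library. Because the coordinates are projected (planar), ordinary geometry gives lengths in metres and areas in square metres.

```python
import math

# Hypothetical British National Grid coordinates (easting, northing) in metres.
point = (530000.0, 180000.0)
line = [(530000.0, 180000.0), (530300.0, 180400.0), (530300.0, 181000.0)]
# A polygon is a closed ring: the first vertex is repeated at the end.
polygon = [(530000.0, 180000.0), (531000.0, 180000.0),
           (531000.0, 181000.0), (530000.0, 181000.0),
           (530000.0, 180000.0)]


def line_length(coords):
    """Sum of straight-line segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


print(line_length(line))      # length in metres
print(polygon_area(polygon))  # area in square metres
```

The same calculation on latitude and longitude values would not give metres, which is one practical reason for working in a projected coordinate reference system.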
“Everything is related to everything else, but near things are more related than distant things.”

This observation underpins much of what spatial data scientists do
Being able to locate something in space allows us to:
explain why something may be occurring where it is
make better predictions about nearby or further away things
Underpins the whole Geodemographics (customer segmentation) industry!

Near and distant can mean different things in different contexts
In spatial data science, one way of separating near from distant is simply to define their topological relationship - the Dimensionally Extended 9-Intersection Model (DE-9IM) is the standard topological model used in GIS
Touching or overlapping objects = ‘near’



Other conceptions of near might include any contiguous ward with distant simply being those which are not contiguous
Near or distant could also be defined by some distance threshold
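Two of the non-topological definitions of "near" can be sketched as follows: a distance threshold, and the k nearest zones. The ward labels, centroid coordinates and function names are all hypothetical, and distances are straight-line distances between centroids in a projected coordinate system.

```python
import math

# Hypothetical ward centroids in a projected CRS (metres).
centroids = {
    "A": (0.0, 0.0),
    "B": (1000.0, 0.0),
    "C": (0.0, 1500.0),
    "D": (5000.0, 5000.0),
}


def distance_neighbours(points, threshold):
    """Each zone's neighbours are the other zones within `threshold` metres."""
    return {
        i: sorted(j for j in points
                  if j != i and math.dist(points[i], points[j]) <= threshold)
        for i in points
    }


def k_nearest_neighbours(points, k):
    """Each zone's neighbours are its k closest other zones."""
    return {
        i: sorted((j for j in points if j != i),
                  key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in points
    }


near_2km = distance_neighbours(centroids, 2000.0)
nearest_2 = k_nearest_neighbours(centroids, 2)
print(near_2km)
print(nearest_2)
```

Note the difference in behaviour for the isolated zone D: with a 2 km threshold it has no neighbours at all, whereas the k-nearest definition always gives it exactly k neighbours, however far away they are.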

Is there any pattern? Do better scores and worse scores appear to be clustered? How can we tell?
Spatial Autocorrelation - phenomenon of near things being more similar than distant things.
Can test for spatial autocorrelation by comparing the GCSE Scores in any given ward with the GCSE scores in neighbouring wards (however we choose to define our neighbours - k-nearest, those that are contiguous etc.)
Average value of GCSE scores in the neighbouring wards is known as the spatial lag of GCSE scores
(Intercept)                            190.2624075
average_gcse_capped_point_scores_2014    0.4190508
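As a rough illustration of the spatial lag, the sketch below averages each zone's neighbours' values, which is what row-standardised weights do. The zone labels, neighbour lists and scores are made up for the example, and `spatial_lag` is a hypothetical helper, not a library function.

```python
# Hypothetical neighbour lists and scores for four zones.
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
scores = {"A": 320.0, "B": 340.0, "C": 300.0, "D": 360.0}


def spatial_lag(values, nbrs):
    """Mean of each zone's neighbours' values (row-standardised weights).

    Assumes every zone has at least one neighbour; a zone with an empty
    neighbour list would need special handling.
    """
    return {i: sum(values[j] for j in js) / len(js) for i, js in nbrs.items()}


lag = spatial_lag(scores, neighbours)
print(lag)
```

Plotting each zone's own score against its lag gives the familiar Moran scatterplot.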
Moran I test under randomisation
data: LondonWardsMerged$average_gcse_capped_point_scores_2014
weights: nb2listw(LWard_nb)
Moran I statistic standard deviate = 17.785, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic   Expectation     Variance
     0.4190507533 -0.0016025641 0.0005594495
Moran’s I = 0.42
Moderate, positive spatial autocorrelation between average GCSE scores in London - some clustering of both low and high scores
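For readers who want to see what sits behind the test output above, here is a minimal sketch of global Moran's I on a toy example, followed by a re-derivation of two of the reported figures. The toy zones and weights are hypothetical, `morans_i` is an illustrative helper, and the ward count of 625 is an assumption taken from the number of observations reported in the later model output.

```python
import math


def morans_i(values, weights):
    """Global Moran's I for values x_i and a weights dict {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {i: v - mean for i, v in values.items()}
    s0 = sum(weights.values())  # sum of all weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * cross / sum(d * d for d in dev.values())


# Hypothetical example: four zones along a line, neighbours are adjacent
# zones, weights row-standardised so each zone's weights sum to 1.
values = {1: 10.0, 2: 12.0, 3: 20.0, 4: 22.0}
weights = {(1, 2): 1.0, (2, 1): 0.5, (2, 3): 0.5,
           (3, 2): 0.5, (3, 4): 0.5, (4, 3): 1.0}
toy_i = morans_i(values, weights)

# Re-deriving reported figures, assuming 625 wards.
n_wards = 625
expected_i = -1.0 / (n_wards - 1)  # expectation under no spatial autocorrelation
z = (0.4190507533 - expected_i) / math.sqrt(0.0005594495)  # standard deviate
print(toy_i, expected_i, z)
```

The toy values are low at one end of the line and high at the other, so similar values sit next to each other and the statistic comes out positive. The last two numbers should agree, to rounding, with the "Expectation" and "standard deviate" lines in the output.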
Spatial autocorrelation might be expected: overlaying the distribution of schools shows that pupils from multiple neighbouring wards might attend the same school
Having observed some spatial patterns in school exam performance in London, we might next want to explain these patterns, perhaps using another variable measured for the same spatial units.
Our own experience might tell us that missing class could negatively impact our ability to learn things in that class
Hypothesis: wards where there are higher rates of absence from school might tend to experience lower average exam grades
(Intercept)                                        371.71500
unauthorised_absence_in_all_schools_percent_2013   -41.40264
Taking the whole of London, it would appear that there is a moderately strong, negative relationship between missing school and exam performance
For every additional 1% of school days missed, we might expect a decrease of about 41 points in the average GCSE score.
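The fitted line can be read as a simple prediction rule. The sketch below plugs the two reported coefficients into a hypothetical helper, `predicted_gcse`. The absence rates in the loop are illustrative inputs, not values from the dataset.

```python
INTERCEPT = 371.71500  # reported (Intercept)
SLOPE = -41.40264      # reported coefficient on unauthorised absence (%)


def predicted_gcse(absence_percent):
    """Fitted average GCSE capped point score for a given absence rate."""
    return INTERCEPT + SLOPE * absence_percent


for absence in (0.5, 1.0, 1.5):  # illustrative absence rates, not from the data
    print(absence, round(predicted_gcse(absence), 1))
```

Each one-point increase in the absence percentage lowers the fitted score by the same amount everywhere, which is exactly the "one relationship for the whole city" assumption questioned next.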
But does this relationship hold true across all wards in the city?
The Moran’s I for GCSE scores already tells us that the observations are probably not independent of each other (violating one assumption of regression)
Mapping the residual values from the regression model allows us to observe any spatial clustering in the errors
Clustering of residuals could also indicate a violation of the independence assumption of errors
Moran I test under randomisation
data: LondonWardsMerged$model1_resids
weights: nb2listw(LWard_nb)
Moran I statistic standard deviate = 12.183, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic   Expectation     Variance
     0.2862894906 -0.0016025641 0.0005583971
One way of coping with spatial dependence in the dependent variable is to include the spatial lag of that variable as an independent explanatory variable
Running the spatial lag model reveals that the spatial lag term is statistically significant (rho = 0.47) and reduces the estimated coefficient on absence from about -41 points to about -31 points per additional 1% of school days missed.
Call:
lagsarlm(formula = average_gcse_capped_point_scores_2014 ~ unauthorised_absence_in_all_schools_percent_2013,
data = LondonWardsMerged, listw = nb2listw(LWard_nb, style = "W"),
method = "eigen")
Residuals:
      Min        1Q    Median        3Q       Max
-68.70402  -9.44615  -0.64207   8.53417  58.56788
Type: lag
Coefficients: (asymptotic standard errors)
                                                 Estimate Std. Error z value  Pr(>|z|)
(Intercept)                                      207.4009    15.0053  13.822 < 2.2e-16
unauthorised_absence_in_all_schools_percent_2013 -30.7843     2.0792 -14.806 < 2.2e-16
Rho: 0.46705, LR test value: 104.93, p-value: < 2.22e-16
Asymptotic standard error: 0.041738
z-value: 11.19, p-value: < 2.22e-16
Wald statistic: 125.22, p-value: < 2.22e-16
Log likelihood: -2581.93 for lag model
ML residual variance (sigma squared): 217.21, (sigma: 14.738)
Number of observations: 625
Number of parameters estimated: 4
AIC: 5171.9, (AIC for lm: 5274.8)
LM test for residual autocorrelation
test value: 3.0949, p-value: 0.078537
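Two of the diagnostics in the output above can be checked by hand, which helps when reading them. This is a small sketch using only numbers copied from the output. It assumes the standard definition AIC = 2k - 2 log L, and that the z-value for rho is the estimate divided by its asymptotic standard error.

```python
log_likelihood = -2581.93  # "Log likelihood" line
n_parameters = 4           # "Number of parameters estimated" line
rho, rho_se = 0.46705, 0.041738

aic = 2 * n_parameters - 2 * log_likelihood
rho_z = rho / rho_se

print(round(aic, 1))    # compare with the reported AIC for the lag model
print(round(rho_z, 2))  # compare with the reported z-value for rho
```

The lag model's AIC is roughly 100 points lower than the AIC reported for the ordinary linear model (5274.8), and lower AIC indicates a better trade-off between fit and complexity. The LM test p-value of about 0.08 suggests little remaining evidence of residual autocorrelation at the 5% level.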
One reason behind a clustering of residuals could be that the relationship between dependent and independent variables might not remain constant across space
In some parts of London, it could be that as unauthorised absence from school rises, exam grades also rise (as unlikely as that might be!).
Or, more plausibly, that in some parts of the city, absence has an even more pronounced negative effect than in others.
It’s also likely that the intercept values (the average GCSE score, given no days of unauthorised absence) will differ in different parts of the city - some areas, on average, doing better than others
We can test for the presence of such phenomena by running a series of smaller, more localised regressions and comparing the coefficients that emerge
GWR is a method for systematically running a series of localised regression analyses across a study area, collecting coefficients and other diagnostics for each zone of interest.
Something similar can be achieved through spatial sub-setting - i.e. running analyses for groups of zones within a higher level geography

In a GWR analysis, kernel weighting functions of different bandwidths (diameters) and shapes are used to include and weight or exclude neighbouring observations
Adaptive weighting can be used to adjust the size of the kernel according to some threshold of observations
For every point in the dataset a regression is run including the values within the kernel (which, of course, can only be achieved effectively through understanding the coordinate reference system of the observations)
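The mechanics described above can be sketched in a few lines. This is a deliberately simplified illustration, not the GWR implementation used for the results in these slides: it assumes a fixed-bandwidth Gaussian kernel, a single explanatory variable, and made-up data in which the true slope differs between two clusters of locations. All function names are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight 1 at distance 0, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def weighted_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def gwr(coords, xs, ys, bandwidth):
    """One local regression per location, weighting observations by proximity."""
    results = []
    for here in coords:
        ws = [gaussian_weight(math.dist(here, other), bandwidth)
              for other in coords]
        results.append(weighted_fit(xs, ys, ws))
    return results


# Hypothetical data: the slope is -2 in the western cluster, -6 in the eastern.
coords = [(float(e), 0.0) for e in (0, 1, 2, 3, 20, 21, 22, 23)]  # km, projected
xs = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
ys = [10 - 2 * x for x in xs[:4]] + [30 - 6 * x for x in xs[4:]]

local = gwr(coords, xs, ys, bandwidth=2.0)
for (easting, _), (intercept, slope) in zip(coords, local):
    print(easting, round(intercept, 2), round(slope, 2))
```

A single global regression on these data would report one compromise slope, while the local fits recover a different slope in each cluster. An adaptive kernel would replace the fixed `bandwidth` with, for example, the distance to each location's k-th nearest observation.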
Plotting coefficient values for each ward reveals noticeable non-stationarity in the relationship between absence and GCSE scores
In well-off central London boroughs (particularly Hammersmith and Fulham, Kensington and Chelsea and Camden) we see evidence that absence is positively related to GCSE performance
In some of the outer-London boroughs (Barnet, Sutton, Richmond etc.) the effect of missing school is even more severe than it is elsewhere in the city
Methods which accommodate space explicitly can help us better understand spatial phenomena
However, just because patterns can be observed at one scale, level of aggregation or for one particular arrangement of spatial units, it is not necessarily the case that these patterns will hold or even be relevant should the space be arranged differently
Politicians have known about the issues of scale and aggregation for a long time
The practice of Gerrymandering is widespread wherever there is a first-past-the-post electoral system
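A toy example shows how much the arrangement of units can matter. The sketch below uses a hypothetical electorate of nine neighbourhoods and two hypothetical districting plans. It ignores contiguity and population balance, so it illustrates the aggregation effect only and is not a model of real boundary drawing.

```python
from collections import Counter

# Hypothetical electorate: 9 neighbourhoods, each voting for party "X" or "Y".
votes = ["X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y"]


def seats_won(votes, districts):
    """Count districts won by each party under first-past-the-post."""
    seats = Counter()
    for members in districts:
        tally = Counter(votes[i] for i in members)
        seats[tally.most_common(1)[0][0]] += 1
    return seats


# Two ways of grouping the same nine neighbourhoods into three districts.
plan_a = [(0, 1, 4), (2, 3, 5), (6, 7, 8)]  # X wins two districts by 2 votes to 1
plan_b = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]  # X voters packed into one district

print(seats_won(votes, plan_a))
print(seats_won(votes, plan_b))
```

The votes are identical in both plans, with Y holding five of the nine, yet the party winning most seats changes with the boundaries. The same sensitivity applies to any statistic computed on aggregated spatial units.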
